Freshers / Beginner level questions
Ques 1. What is PySpark?
PySpark is the Python API for Apache Spark, a fast and general-purpose cluster computing system.
Example:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example').getOrCreate()
Ques 2. Explain the purpose of the 'groupBy' operation in PySpark.
'groupBy' is used to group the data based on one or more columns. It is often followed by aggregation functions to perform operations on each group.
Example:
grouped_data = df.groupBy('Category').agg({'Price': 'mean'})
Ques 3. Explain the concept of a SparkSession in PySpark.
SparkSession is the unified entry point to PySpark's DataFrame and SQL functionality, introduced in Spark 2.0. It is used to create DataFrames, register DataFrames as tables, and execute SQL queries.
Example:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('example').getOrCreate()
Ques 4. Explain the purpose of the 'collect' action in PySpark.
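The 'collect' action retrieves all elements of a distributed dataset (RDD or DataFrame) and returns them to the driver program as a Python list. Because the entire result must fit in driver memory, it is best reserved for small datasets.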
Example:
data = df.collect()
Ques 5. How can you perform a union operation on two DataFrames in PySpark?
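You can use the 'union' method to combine two DataFrames with the same schema. 'union' matches columns by position rather than by name and keeps duplicate rows; use 'unionByName' when the column order differs.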
Example:
result = df1.union(df2)
Ques 7. How can you create a temporary view from a PySpark DataFrame?
You can use the 'createOrReplaceTempView' method to create a temporary view from a PySpark DataFrame. The view is scoped to the current SparkSession and can be queried with 'spark.sql'.
Example:
df.createOrReplaceTempView('temp_view')
Ques 8. What is the purpose of the 'orderBy' operation in PySpark?
'orderBy' is used to sort the rows of a DataFrame based on one or more columns, in ascending order by default.
Example:
result = df.orderBy('column')